optionally allow static node name prefixes #871
Instances of nodetool generate random node name prefixes so that multiple calls can run in parallel. However, each time nodetool connects to the target node, a new atom is created on the latter. If this happens frequently enough or for long enough, the node eventually hits the atom table limit and crashes. As a workaround, if the caller can guarantee that calls are serialized and isolated in time, defining the environment variable $NODETOOL_NODE_PREFIX makes every call use the same node name prefix, so no new atoms are generated.

The proposed change is complementary to erlware#868. It aims to address an issue reported by one of our customers, in which a riak node hit the atom table limit (yes, all of 1M+ entries) and crashed. A postmortem showed the table filled with `[email protected]`, accumulated over time from calls to `riak admin status` every 5 min.

Note that I did not attempt the equivalent changes in extended_bin_windows, as it's not clear to me what they would be (my knowledge of Windows scripting is some 30 years old).
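The behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in extended_bin: the function name `nodetool_id` is hypothetical, and the random id drawn from /dev/urandom is a stand-in for whatever the real script generates.

```shell
#!/bin/sh
# Hypothetical sketch, not the real extended_bin code: pick the id that goes
# into the short-lived nodetool node name.
nodetool_id() {
    # ${VAR:-} expands to an empty string when the variable is unset, so the
    # test also works under `set -u`.
    if [ -n "${NODETOOL_NODE_PREFIX:-}" ]; then
        # Static prefix: every call reuses the same node name, so the target
        # node does not gain a new atom per call.
        printf '%s\n' "$NODETOOL_NODE_PREFIX"
    else
        # Default: a fresh random id per call (4 bytes from /dev/urandom,
        # printed as 8 hex digits). Each distinct id can become a new atom
        # on the target node.
        od -An -N4 -tx1 /dev/urandom | tr -d ' \n'
        printf '\n'
    fi
}
```

With something like this in place, a caller that serializes its calls could export the variable once and reuse the same name on every invocation. The static branch is only safe when calls never overlap, since two simultaneous nodes with the same name would clash.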
Thanks. As with the other one, this issue is resolved when running on OTP 23+, but I guess this solution is simple enough that I can't say no to the change. This code will go away when 23.1 is the oldest OTP we support, but that won't be for a long time.
priv/templates/extended_bin
# are entirely sequential.
if [ x != x$NODETOOL_NODE_PREFIX ]; then
echo $NODETOOL_NODE_PREFIX
if [ x != x"$NODETOOL_NODE_PREFIX" ]; then
Why not check if $NODETOOL_NODE_PREFIX is defined and not empty?
I have no strong opinions here.
Ok, then please switch to using -z. That is how these are checked in the rest of the script, so good to be consistent.
Using -z now.
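For readers comparing the variants discussed in this thread, here is a minimal sketch of the `-z` style of check. It is an illustration only: `has_prefix` is a hypothetical helper name and does not appear in extended_bin.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of extended_bin.
# Exit status 0 when NODETOOL_NODE_PREFIX is set to a non-empty value.
has_prefix() {
    # [ -z STRING ] is true when STRING is empty, which also covers an unset
    # variable. The quotes keep the value a single word even if it contains
    # spaces; the unquoted `x$VAR` comparison can break in that case.
    # ${VAR:-} avoids an "unbound variable" error under `set -u`.
    if [ -z "${NODETOOL_NODE_PREFIX:-}" ]; then
        return 1
    fi
    return 0
}
```

The older `x != x$VAR` idiom guards against empty operands in very old shells; with a quoted operand, `-z` expresses the same intent directly and matches the style used elsewhere in the script.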
@tsloughter Sorry for the potentially dumb question: now that you have approved the PR even though some tests are failing, what do we do next? I'm asking because there's an accompanying change I wanted to post to basho/riak, which would eventually allow operators to run `NODETOOL_NODE_PREFIX=static42 riak admin status` and never lose sleep over the atom table limit.
Gotta figure out why the tests fail so I can merge it.
There's sort of no reason for it; I see tests failing only on Windows, while this change touches only the non-Windows stuff?
Is there any progress on this? Much as I would be happy to help, I have no windows knowledge myself. Can anybody help?
Oh, facepalm: on Windows it tries to compile rebar3, and I don't think that works on otp-19 anymore. I say we just remove Windows otp-19 (and may as well remove otp-20) while adding otp-24. Sorry, it's not related to your actual PR, but do you mind slipping that in here as another commit? It may be faster than me getting a new PR created, reviewed and merged with the change.
Just change
Hah. Gleam's stuff for Windows doesn't have 24 yet.
no otp-24 for windows yet